Fairfax, VA 22030 USA

Abstract 

In this paper, we present some graphical techniques for cluster
analysis of high-dimensional data. Parallel coordinate plots and
parallel coordinate density plots are graphical techniques which map
multivariate data into a two-dimensional display. The method has
some elegant duality properties with ordinary Cartesian plots so that
higher-dimensional mathematical structures can be analyzed. Our
high interaction software allows for rapid editing of data to remove
outliers and isolate clusters by brushing. Our brushing techniques
allow not only for hue adjustment, but also for saturation adjustment.
Saturation adjustment allows for the handling of comparatively
massive data sets by using the α-channel of the Silicon Graphics
workstation to compensate for heavy overplotting.

The grand tour is a generalized rotation of coordinate axes in a
high-dimensional space. Coupled with the full-dimensional plots
allowed by the parallel coordinate display, these techniques allow the
data analyst to explore data which is both high-dimensional and
massive in size. In this paper we give a description of both techniques
and illustrate their use to do inverse regression and clustering. We
have used these techniques to analyze data on the order of 250,000
observations in 8 dimensions. Because the analysis requires the use
of color graphics, in the present paper we illustrate the methods with
a more modest data set of 3848 observations. Other illustrations are
available on our web page.

1.  Introduction 

Visualization of high-dimensional, multivariate data has enjoyed
considerable development with the introduction of the grand tour by Asimov
(1985) and Buja and Asimov (1985) and the parallel coordinate display by
Inselberg (1985) and Wegman (1990). The former technique is an animation
methodology for viewing two-dimensional projections of general d-dimensional
data where the animation is determined by a space-filling curve through all
possible orientations of a two-dimensional coordinate system in d-space.
Viewed as a function of time, the grand-tour animation reveals interesting
projections of the data, projections that reveal underlying structure. This
in turn allows for the construction of models of the data's underlying
structure.

The parallel coordinate display is in many senses a generalization of a
two-dimensional Cartesian plot. The idea is to sacrifice orthogonal axes by
drawing the axes parallel to each other in order to obtain a planar diagram
in which each d-dimensional point has a unique representation. Because of
elegant duality properties, parallel coordinate displays allow
interpretations of statistical data in a manner quite analogous to
two-dimensional Cartesian scatter plots. Wegman (1991) formulated a general
d-dimensional form of the grand tour and suggested using the parallel
coordinate plot as a visualization tool for the general d-dimensional
animation. We have found this combination of multivariate visualization
tools to be extraordinarily effective in the exploration of multivariate
data. In Section 2, we briefly describe parallel coordinate displays
including interpretation of parallel coordinate displays for detecting
clusters. In Section 3, we describe the generalized d-dimensional grand tour
and a partial sub-dimensional grand tour. In Section 4, we discuss brushing
with hue and saturation including a discussion of perceptual considerations
for visual presentation. Finally we close in Section 5 with a sequence of
illustrations of the use of these techniques to remove noise and isolate
clusters in a five-dimensional data set.

2. Parallel Coordinate and Parallel Coordinate Density Plots

The parallel coordinate plot is a geometric device for displaying points in
high-dimensional spaces, in particular, for dimensions above three. As such,
it is a graphical alternative to the conventional scatterplot. The parallel
coordinate density plot is closely related and addresses the situation in
which there would be heavy overplotting. In this circumstance, the parallel
coordinate plot is replaced with its density and so is much more appropriate
for very large, high-dimensional data sets. In place of the conventional
scatter plot which tries to preserve orthogonality of the d-dimensional
coordinate axes, draw the axes as parallel. A vector $(x_1, x_2, \ldots, x_d)$
is plotted by plotting $x_1$ on axis 1, $x_2$ on axis 2 and so on through
$x_d$ on axis d. The points plotted in this manner are joined by a broken
line. The principal advantage of this plotting device is that each vector
$(x_1, x_2, \ldots, x_d)$ is represented in a planar diagram in which each
vector component has essentially the same representation.
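
The construction just described can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function name `parallel_coordinate_polyline`, the choice of placing axis j at horizontal position j, and the rescaling of each variable to [0, 1] are all assumptions made for the example.

```python
def parallel_coordinate_polyline(x, lo, hi):
    """Return the (axis_position, height) vertices of the broken line for x.

    Assumption: axis j is drawn at horizontal position j, and each component
    is rescaled to [0, 1] using the per-variable bounds lo[j] and hi[j].
    """
    vertices = []
    for j, (value, low, high) in enumerate(zip(x, lo, hi)):
        span = high - low
        # A constant variable has no spread; place it mid-axis.
        height = 0.5 if span == 0 else (value - low) / span
        vertices.append((j, height))
    return vertices


# Three hypothetical observations in three dimensions.
data = [(1.0, 10.0, 0.2), (3.0, 30.0, 0.4), (2.0, 20.0, 0.3)]
lo = [min(col) for col in zip(*data)]
hi = [max(col) for col in zip(*data)]
lines = [parallel_coordinate_polyline(row, lo, hi) for row in data]
```

Each entry of `lines` is the list of vertices to be joined by straight segments, so every observation, whatever its dimension, becomes one broken line in the plane.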

The parallel coordinate representation enjoys some elegant duality properties
with the usual Cartesian orthogonal coordinate representation. Consider a
line $\ell$ in the Cartesian coordinate plane given by $\ell: y = mx + b$ and
consider two points lying on that line, say $(a, ma + b)$ and $(c, mc + b)$.
Superimpose Cartesian coordinate axes t, u on the xy parallel axes so that
the y parallel axis has the equation $u = 1$. The point $(a, ma + b)$ in the
xy Cartesian system maps into the line joining $(a, 0)$ to $(ma + b, 1)$ in
the tu coordinate axes. Similarly, $(c, mc + b)$ maps into the line joining
$(c, 0)$ to $(mc + b, 1)$. A straightforward computation shows that, for
$m \neq 1$, these two lines intersect at a point (in the tu plane) given by
$\bar{\ell}: (b(1 - m)^{-1}, (1 - m)^{-1})$. This point in the parallel
coordinate plot depends only on m and b, the parameters of the original line
in the Cartesian plot. Thus $\bar{\ell}$ is the dual of $\ell$, and one has
the interesting duality result that points in Cartesian coordinates map into
lines in parallel coordinates while lines in Cartesian coordinates map into
points in parallel coordinates. This duality is discussed in further detail
in Wegman (1990).
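
The duality can be checked numerically. The sketch below is an illustration only; the function names and the sample values of m, b and a are assumptions. It evaluates, at the height of the claimed dual point, the parallel coordinate line of several Cartesian points on y = mx + b and confirms that they all pass through the same t value.

```python
def dual_point(m, b):
    """Point in the tu plane dual to the Cartesian line y = m*x + b (m != 1)."""
    return (b / (1.0 - m), 1.0 / (1.0 - m))


def segment_line_at(a, m, b, u):
    """t-coordinate at height u of the line joining (a, 0) to (m*a + b, 1)."""
    return a + u * ((m * a + b) - a)


m, b = -0.5, 3.0
t_star, u_star = dual_point(m, b)
# Every point (a, m*a + b) on the Cartesian line gives a line through (t_star, u_star).
crossings = [segment_line_at(a, m, b, u_star) for a in (-2.0, 0.0, 1.5, 4.0)]
```

With a negative slope the dual point lies between the two axes (0 < u < 1), which is why negatively correlated data show crossing lines between adjacent axes.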

The point-line, line-point duality seen in the transformation from Cartesian
to parallel coordinates extends to conic sections. The most significant of
these dualities from a statistical point of view is that an ellipse in
Cartesian coordinates maps into a hyperbola in parallel coordinates. A
distribution which has ellipsoidal level sets would have hyperbolic level
sets in the parallel coordinate presentation. It should be noted that the
quadratic form does not describe a locus of points, but a locus of lines, a
line conic. The notion of a line conic is, perhaps, a strange one. By this is
meant a locus of lines whose coordinates satisfy the equation for a conic.
These may be more easily related to the usual notion of a conic when it is
realized that the envelope of this line conic is a point conic. As mentioned,
there is a duality between points and lines and between conics and conics. It
is worthwhile to point out two other nice dualities. Rotations in Cartesian
coordinates become translations in parallel coordinates and vice versa.
Perhaps more interesting from a statistical point of view is that points of
inflection in Cartesian space become cusps in parallel coordinate space and
vice versa. Thus the relatively hard-to-detect inflection point property of a
function becomes the notably easier-to-detect cusp in the parallel coordinate
representation. Inselberg (1985) discusses these properties in detail.

Since ellipses map into hyperbolas, one has an easy template for diagnosing
uncorrelated data pairs. With a completely uncorrelated data set, one would
expect the 2-dimensional scatter diagram to fill substantially a
circumscribing circle. The parallel coordinate plot would approximate a
figure with a hyperbolic envelope. As the correlation approaches negative
one, the hyperbolic envelope would deepen so that in the limit one would have
a pencil of lines, what is called by Wegman (1990) the cross-over effect.

Most importantly for the present paper, it should be noted that clustering is
easily diagnosed using the parallel coordinate representation. The individual
parallel coordinate axes represent one-dimensional projections of the data.
Thus, separation between or among sets of data on any one axis or between
any pair of axes represents a view of the data which isolates clusters. An
elementary view of this idea is seen in Figure 1, where we illustrate the
appearance of three distinct clusters in a four-dimensional space. Because of
the connectedness of the multidimensional parallel coordinate diagram, it is
usually easy to see whether or not this clustering propagates through other
dimensions.

Some of the data analysis features of the parallel coordinate representation
include the ability to diagnose one-dimensional features such as marginal
densities, two-dimensional features such as correlations and nonlinear
structures, and multi-dimensional features such as clustering, hyperplanes,
and modes. These interpretations are discussed in more detail in Wegman
(1990) while parallel coordinate density plots are discussed in Miller and
Wegman (1991).

3.  The  Grand  Tour  Algorithm  in  d-space 

Let $e_j = (0, 0, \ldots, 0, 1, 0, \ldots, 0)$ be the canonical basis vector
of length d. The 1 is in the $j$th position. The $e_j$ are the unit vectors
for each coordinate axis in the initial position. We want to do a general
rigid rotation of these axes to a new position with basis vectors
$a_j(t) = (a_1^j(t), \ldots, a_d^j(t))$, where, of course, t is a time index.
The strategy then is to take the inner product of each data point, say
$x_i$, $i = 1, \ldots, n$, with the basis vectors $a_j(t)$. The j subscript
on $a_j(t)$ means that $a_j(t)$ is the image under the generalized rotation
of the canonical basis vector $e_j$. Thus the data vector $x_i$ is
$(x_1^i, x_2^i, \ldots, x_d^i)$, so that the representation of $x_i$ in the
$a_j$ coordinate system is

$$y_i(t) = (y_1^i(t), \ldots, y_d^i(t)), \quad i = 1, \ldots, n$$

where

$$y_j^i(t) = \sum_{k=1}^{d} x_k^i \, a_k^j(t),$$

$j = 1, \ldots, d$ and $i = 1, \ldots, n$. The vector $y_i(t)$ is then the
linear combination of basis vectors representing the $i$th data point in the
rotated coordinate system at time t.
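
As a minimal sketch of this projection step, assuming the rotated basis at a given time is supplied as a list of d vectors (the function name `project` and the sample data are illustrative, not taken from the paper):

```python
def project(points, basis):
    """Return y with y[i][j] = inner product of data point x_i and basis vector a_j."""
    return [[sum(xk * ak for xk, ak in zip(x, a)) for a in basis] for x in points]


# Two hypothetical data points in the plane.
points = [(1.0, 2.0), (-3.0, 0.5)]

# The initial (unrotated) basis leaves the data unchanged.
identity_basis = [(1.0, 0.0), (0.0, 1.0)]
unrotated = project(points, identity_basis)

# A basis rotated by a quarter turn: a_1 = (0, 1), a_2 = (-1, 0).
quarter_turn_basis = [(0.0, 1.0), (-1.0, 0.0)]
rotated = project(points, quarter_turn_basis)
```

At each animation frame only the basis changes; the data themselves are never modified.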

The goal thus is to find a generalized rotation, Q, such that $Q(e_j) = a_j$.
We can think of Q as either a function or as a matrix Q where
$e_j \times Q = a_j$. We implement this by choosing Q as an element of the
special orthogonal group denoted by SO(d) of orthogonal $d \times d$ matrices
having determinant of +1. In order to find a continuous, space-filling path
through the Grassmannian manifold of d-flats, we must find a continuous,
space-filling path through the SO(d).

In general d-dimensional space, there are d − 2 axes orthogonal to each
two-flat. Thus rather than rotating around an axis as we are used to in
ordinary three-dimensional space, we must rotate in a plane in d-dimensional
space. The generalized rotation matrix, Q, is built up from a series of
rotations in individual two-flats. In d-space, there are d canonical basis
vectors and, thus, $\binom{d}{2} = \frac{1}{2}(d^2 - d)$ distinct two-flats
formed by the canonical basis vectors. We let $R_{ij}(\theta)$ be the element
of SO(d) which rotates in the $e_i e_j$ plane through an angle of $\theta$.
We define Q by

$$Q(\theta_{1,2}, \theta_{1,3}, \ldots, \theta_{d-1,d}) = R_{12}(\theta_{1,2}) \times \cdots \times R_{d-1,d}(\theta_{d-1,d}).$$

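There are $p = \frac{1}{2}(d^2 - d)$ factors. The restrictions on
$\theta_{ij}$ are $0 \le \theta_{ij} \le 2\pi$, $1 \le i < j \le d$. The
vector $(\theta_{1,2}, \theta_{1,3}, \ldots, \theta_{d-1,d})$ can thus be
thought of as a point on a p-dimensional torus. This is the origin of the
description of this method as the torus method. The individual factors
$R_{ij}(\theta)$ are $d \times d$ matrices given by

$$R_{ij}(\theta) = \begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & \cos(\theta) & \cdots & -\sin(\theta) & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & \sin(\theta) & \cdots & \cos(\theta) & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix}$$

where the cosine and sine entries are in the $i$th and $j$th columns and rows.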
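
The construction of Q from plane rotations can be sketched as follows. This is an illustrative implementation under stated assumptions: indices are 0-based, matrices are plain nested lists, and the helper names `plane_rotation`, `matmul` and `generalized_rotation` are hypothetical. The factors are multiplied in the order (1,2), (1,3), ..., (d−1,d).

```python
import math
from itertools import combinations


def plane_rotation(d, i, j, theta):
    """d x d matrix rotating in the e_i e_j plane by theta (0-based, i < j)."""
    r = [[1.0 if row == col else 0.0 for col in range(d)] for row in range(d)]
    c, s = math.cos(theta), math.sin(theta)
    r[i][i], r[i][j] = c, -s
    r[j][i], r[j][j] = s, c
    return r


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def generalized_rotation(d, thetas):
    """Product of the d(d-1)/2 plane rotations, one angle per two-flat."""
    pairs = list(combinations(range(d), 2))
    if len(thetas) != len(pairs):
        raise ValueError("expected %d angles, got %d" % (len(pairs), len(thetas)))
    q = [[1.0 if row == col else 0.0 for col in range(d)] for row in range(d)]
    for (i, j), theta in zip(pairs, thetas):
        q = matmul(q, plane_rotation(d, i, j, theta))
    return q


# d = 4 gives p = 6 angles; these values are arbitrary.
q = generalized_rotation(4, [0.3, 1.1, 2.0, 0.7, 4.2, 5.5])
a_1 = q[0]  # e_1 x Q is simply the first row of Q
```

Because every factor is orthogonal with determinant +1, so is the product, which is what keeps the rotated axes an orthonormal frame.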

The final step in the algorithm is to describe a space filling path on the
p-dimensional torus, $T^p$. This can be done by a mapping
$\alpha: \mathbb{R} \to T^p$ given by

$$\alpha(t) = (\lambda_1 t, \lambda_2 t, \ldots, \lambda_p t)$$

where $\lambda_1, \ldots, \lambda_p$ is a sequence of mutually irrational
real numbers and the $\lambda_i t$ are interpreted modulo $2\pi$. The
composition of $\alpha$ with Q will describe a space filling path in SO(d).
Thus our final algorithm is given by

$$a_j(t) = e_j \times Q(\lambda_1 t, \ldots, \lambda_p t).$$
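
A minimal sketch of the torus path follows. The paper does not say which multipliers were used; taking them to be square roots of distinct primes is an assumption made here because such numbers are linearly independent over the rationals. The function name `torus_point` is likewise illustrative.

```python
import math


def torus_point(t, lambdas):
    """Angles (lambda_1 t, ..., lambda_p t), each interpreted modulo 2*pi."""
    return [(lam * t) % (2.0 * math.pi) for lam in lambdas]


# Assumed multipliers for d = 4, i.e. p = 6 two-flats.
lambdas = [math.sqrt(prime) for prime in (2, 3, 5, 7, 11, 13)]

start = torus_point(0.0, lambdas)
later = torus_point(12.5, lambdas)
```

Feeding `torus_point(t, lambdas)` into the product of plane rotations for successive values of t yields the animation frames.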

The canonical unit vector for each coordinate axis at time t described by the
grand tour algorithm is an orthogonal linear combination,
$a_j(t) = e_j \times Q(t)$, of the original unit vectors. This has several
important implications for the utility of this methodology. First, it should
be immediately clear that one can do a grand tour on any subset of the
original coordinate axes simply by fixing the appropriate two-planes in the
rotation matrix given above by Q. That is, if we wish the $j$th variable to
not be included in the grand tour rotation, we simply put a 1 in the
$Q_{jj}(t)$ entry with 0 in the remaining positions in the $j$th row and the
$j$th column. Thus it is straightforward to do a partial grand tour. The
interest in doing a partial grand tour will be discussed in the next section.
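
One way to realize a partial grand tour is to skip every plane rotation that involves a fixed variable, which leaves a 1 on the diagonal and zeros elsewhere in that variable's row and column. The sketch below is an assumption about how this could be coded, not a description of the authors' software; `partial_tour_matrix` and its arguments are hypothetical names, and the multipliers are arbitrary.

```python
import math
from itertools import combinations


def partial_tour_matrix(d, t, lambdas, fixed):
    """Rotation at time t that leaves the (0-based) variables in `fixed` untouched."""
    q = [[1.0 if row == col else 0.0 for col in range(d)] for row in range(d)]
    for (i, j), lam in zip(combinations(range(d), 2), lambdas):
        if i in fixed or j in fixed:
            continue  # this two-plane is held fixed (angle zero)
        theta = (lam * t) % (2.0 * math.pi)
        c, s = math.cos(theta), math.sin(theta)
        # Right-multiply q by the plane rotation: only columns i and j change.
        for row in q:
            qi, qj = row[i], row[j]
            row[i] = c * qi + s * qj
            row[j] = -s * qi + c * qj
    return q


lambdas = [math.sqrt(prime) for prime in (2, 3, 5, 7, 11, 13)]
# Hold the fourth variable (index 3) out of the tour, e.g. a response variable.
q = partial_tour_matrix(4, 7.25, lambdas, fixed={3})
```

The remaining variables still undergo a full tour among themselves, while the fixed axis keeps displaying the original variable.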

The second important implication relates to the connection with the parallel
coordinate display. An immediate concern with the parallel coordinate
display is the preferential ordering of the axes. In our discussion above we
indicated that the axis for variable one is adjacent to the axis for variable
two, but not for variable three. In general the axis for variable j is
adjacent to the axes for variables j − 1 and j + 1 but for no other axes. It
is easy to see pairwise relationships for adjacent variables, but less easy
for non-adjacent variables. Wegman (1990) has a substantial discussion on
methods for considering all possible permutations. This concern is immaterial
when one does the grand tour since eventually
$a_j(t) = e_j \times Q(t) = e_i$, for every i. Thus eventually every possible
permutation of the axes will appear in the grand tour.

4.  Some  Additional  Visualization  Devices 

4.1  Brushing  with  Hue  and  Saturation 

A powerful method in high interaction graphics is the brushing technique.
The idea is to isolate clusters or other interesting subsets of a data set
by, in effect, painting that subset with a color. This is usually done in two
settings: 1) with co-plots and 2) with animations. The brushed color becomes
an attribute of the data point and is maintained in all representations. The
idea of co-plots is that a particular data set may be presented in more than
one way, for example in a scatter plot matrix or, say, with a scatter plot, a
histogram and a dot plot. Points colored the same way in all presentations
allow the data analyst to track coherent clusters or subsets of the data
through different representations. Of course, with an animation, the
coloring allows the data analyst to follow clusters or subsets of the data
through the time evolution of the animation.

In general, colors may assume a range of saturations depending on the
relative proportion of gray to chroma. We implement the following device.
We de-saturate the hue with black so that the brushing color is nearly black
when desaturated. This by itself would not be fully useful. However,
when points are overplotted, we add the hue components. Say, for example,
we use a hue component of approximately one and one half percent of blue.
This would mean (on an eight bit scale) approximately 2 bits of blue and
0 bits each of red and green. Thus if approximately 67 observations were
overplotted at a given pixel, that pixel would be fully saturated with blue.
Fewer observations mean a less saturated color. The level of saturation
of the brushing color is controllable by the user. Larger data sets suggest
lower saturation levels. The level of saturation thus reflects the degree of
overplotting. This device is in essence a way of creating a parallel
coordinate (or any other kind of) density plot (see Miller and Wegman, 1991).
The advantage of this technique for creating a density plot is that it does
not depend on smoothing algorithms so that individual data points are still
resolvable.
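
The arithmetic of additive saturation can be sketched in software. This is only an illustration of the idea, not the hardware blending the authors used: the function name, the dictionary image and the fractional scale from 0 to 1 are assumptions. Each observation drawn at a pixel adds a fixed increment of the brushing hue, clamped at full saturation.

```python
def accumulate_saturation(hits, increment=0.015, full=1.0):
    """Add `increment` of the brushing hue per observation at a pixel, clamped at `full`."""
    image = {}
    for pixel in hits:
        image[pixel] = min(full, image.get(pixel, 0.0) + increment)
    return image


# Hypothetical overplotting: 67 observations at one pixel, 66 at another, 1 at a third.
hits = [(0, 0)] * 67 + [(1, 0)] * 66 + [(2, 0)]
image = accumulate_saturation(hits)
```

With a one and one half percent increment, the pixel hit 67 times reaches full saturation while the pixel hit 66 times falls just short, matching the figure quoted in the text. Lowering `increment` suits larger data sets.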

The addition of saturations is implementable in hardware on Silicon Graphics
workstations by means of the α-channel. The α-channel is a hardware
device for blending pixel intensities and has its primary use for
transparency algorithms. However, by blending pixel intensities of the same
color, we can in effect add the pixel intensities and achieve brushing with
hue and saturation with no speed penalty whatsoever. This technique is
incredibly powerful in resolving structure in large data sets with heavy
overplotting as we hope to illustrate in the next section.


4.2  Some  Perceptual  Issues. 


Brushing with hue and saturation leads to an interesting question concerning
perception of the resulting plots. When viewed against a black background,
the low saturation observations, i.e. those that are not heavily overplotted,
blend with that background. This is quite useful when trying to understand
the internal structure of the high density regions of the plot. Our
usual technique is to brush with a white (actually a very dark gray) color.
Then the internal structure appears as white in the highest density regions
as illustrated in Figure 4. The resulting plot looks rather like an x-ray of
the internal structure of the data set.

When viewed against a white background, the low saturation level
observations are nearly black and so are quite visible. This is extremely
useful when looking for outliers, which would tend to be invisible against
the black background. The white background is also extremely useful for data
editing. Our implementation on the Silicon Graphics workstations supports a
scissors feature so that we can prune away low density regions. This feature
allows for rapid visual data editing which may be useful for eliminating
outliers, transcription errors, and data with missing values that could
impair the ability to reach sensible conclusions. Obviously, when using a
white background, it is important to brush with some hue, since when brushed
with a gray the highest density regions would be white, again blending with
the background. Because the apparent brightness of normal hues (red, blue,
green, etc.) is lower than the apparent brightness of white, the internal
structure of the data set is less apparent with a white background than with
a black background. Thus it is clear that both backgrounds have their
utility depending on the task at hand.

4.3  Visual  Regression  and  Clustering  Decision  Rules. 

The combination of hue and saturation brushing and the partial grand tour
creates a device for visual regression and clustering decision rules.
Consider a response variable of interest, let us say, for example, profit in
a financial setting. Let us suppose we wish to answer the following question,
“What combination of customer demographics variables is likely to cause the
corporation to lose money?” We brush the profit variable as follows: for
negative profits, we brush the observations red, for positive profits we
brush them green. Where the observations overlap, the combination of red and
green sums to yellow. Where there are observations primarily leading to
losses, the result will be generally red and where profits, primarily green.
Since we are interested in the covariates leading to profit, we fix the
profit variable so that it does not enter into the partial grand tour
rotation. We may rotate on any combination of explanatory covariates we
wish. For example, we may have data on customer’s average account balance,
sex, race, age and annual income. While all of these may affect the
profitability of the corporation, the prohibitions against discrimination on
the basis of race and sex would lead us to generate decision rules which do
not consider these factors. Similarly, some data may be extremely difficult
or expensive to collect. Thus while it may be an extremely helpful
covariate, it may be missing so often that its value is substantially
diminished in forming decision rules.

The partial grand tour is done on the explanatory covariates of interest while
keeping the response variable and any other explanatory covariates that we
wish to exclude fixed. Because the grand tour automatically forms orthogonal
linear combinations of desired explanatory covariates, the color coding
allows us in effect to see the response variable in terms of the orthogonal
linear combinations of the explanatory variables. Thus when we see a linear
combination of explanatory variables that is intensely red in our example,
we know that this is a combination of variables which leads to a negative
profit. We can thus isolate the range of the linear combination of covariates
that is colored red and this will be a component of the decision tree in terms
of demographic variables that causes the organization to lose money. We
can then edit that particular cluster of observations from our data set and
resume the partial grand tour. Repeating this process recursively allows
us to determine a sequence of decision rules that isolate customers likely to
cause financial loss to the organization. Because this methodology is so
intensively dependent on color, it is not possible to easily illustrate these
techniques in this paper. However, we have included an example based on
this idea as well as other examples in our web server at URL

http://www.galaxy.gmu.edu/images/gallery/research-arcade.html.

This method thus leads to a high interaction technique for rapidly
identifying a decision rule based on visual display. The rules are
sophisticated in the sense that they need not be simple binary decision rules
and they need not be based on simply the original covariates.
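
The red, green and yellow reading of a brushed axis can be mimicked numerically. In the paper this step is done by eye; the sketch below is only a rough stand-in under explicit assumptions: the scores are values of one linear combination of covariates taken from a tour frame, they are binned along the axis, and the names `brush_bins` and `bin_colour` as well as the sample data are hypothetical.

```python
def brush_bins(scores, profits, n_bins, lo, hi):
    """Count [red, green] observations per bin of a linear-combination score.

    Assumption: negative profit is brushed red; zero or positive is brushed green.
    """
    bins = [[0, 0] for _ in range(n_bins)]
    width = (hi - lo) / n_bins
    for score, profit in zip(scores, profits):
        k = min(n_bins - 1, max(0, int((score - lo) / width)))
        bins[k][0 if profit < 0 else 1] += 1
    return bins


def bin_colour(red, green):
    """Additive mixing: red plus green reads as yellow."""
    if red and green:
        return "yellow"
    if red:
        return "red"
    if green:
        return "green"
    return "empty"


scores = [0.1, 0.2, 0.55, 0.6, 0.9, 0.95]   # hypothetical projected covariate values
profits = [-5.0, -1.0, -2.0, 3.0, 4.0, 6.0]  # hypothetical response values
bins = brush_bins(scores, profits, n_bins=3, lo=0.0, hi=1.0)
colours = [bin_colour(red, green) for red, green in bins]
```

A purely red bin corresponds to a range of the linear combination that would enter a decision rule for losses; a yellow bin signals that this view does not yet separate losses from profits.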

5.  An  Example 

In this example we would like to consider a synthetic dataset about the
geometric features of pollen grains. There are 3848 observations on 5
variables. This data is the 1986 ASA Data Exposition dataset, made up by
David Coleman of RCA Labs. The data set is available from STATLIB at
URL=http://www.stat.cmu.edu/datasets/. Figure 2 is the scatterplot matrix
for this five-dimensional data. Note that in all presentations the data
appear to have elliptical contours. This is true even when all five variables
are rotated through the grand tour. This is suggestive of the fact that
the point cloud is sampled from a five-dimensional multivariate distribution
with ellipsoidal level sets, perhaps a multivariate normal. Figure 3 is the
corresponding parallel coordinate display with pure black on a white
background. In this display each of the five variables has been rescaled so
as to fill the parallel coordinate axes. Note that in the parallel coordinate
display, the variables exhibit hyperbolic envelopes, the dual of elliptical
contours in Cartesian plots. This confirms our observation of the
ellipsoidal level sets.

Figure 4 represents a highly desaturated version of the same parallel
coordinate plot, this time white on a black background. With this
desaturated view, it is clear that there is an interesting internal structure
buried in the noise. Figure 5 represents a partially pruned view with much of
the noise removed. Figure 6 is the result of a second pruning edit in which
the internal structure is fully revealed. In both Figures 5 and 6 the data
have been rescaled to fill the axes. Figure 7 represents the result of a
grand tour in which it is clear from the gaps seen in axes two, three and
four that this data forms six clusters separable in at least three dimensions
of the five. Our software permits this edit to be accomplished in less than
three minutes. Figure 8 displays the edited data in a scatter plot display.
The 99 remaining points from the original 3848 points are perfectly isolated
from the noise and spell the word EUREKA. The six letters are of course the
six clusters isolated in Figure 7. The 99 points are only about 2.6% of the
data set and yet we were able to isolate these points in six clusters using
the techniques described in this paper.
Legends for Figures

Figure 1. a. Scatterplot matrix of three clusters in four dimensions.
b. Parallel coordinate plot corresponding to the scatterplot matrix in 1a.
Note that a separation along any axis or in between axes is indicative of a
cluster. Note also that distinctive slopes of the line segments between pairs
of axes also separate clusters.

Figure  2.  The  scatterplot  matrix  of  3848  observations  on  5  variables  from 
a  synthetic  dataset  about  the  geometric  features  of  pollen  grains.  The  level 
sets  appear  to  be  elliptical  in  all  five  dimensions  suggesting  a  five-dimensional 
ellipsoidal  shape.  One  might  be  tempted  to  guess  multivariate  Gaussianity. 

Figure  3.  The  fully  saturated  parallel  coordinate  plot  of  the  same  3848 
observations  in  five  space.  The  hyperbolic  envelope  tends  to  confirm  the 
conclusions  about  a  five  dimensional  ellipsoidal  level  set.  However,  little 
can  be  seen  from  either  Figure  2  or  Figure  3  about  the  internal  structure  of 
this  data. 

Figure  4.  The  desaturated  parallel  coordinate  plot  of  the  3848  observations 
this  time  plotted  on  a  black  background.  Notice  the  internal  structure  and 
the  x-ray  like  appearance  of  this  density  plot. 

Figure 5. An intermediate parallel coordinate plot pruned to remove
observations away from the internal structure. The plot is rescaled to fill
the same scale as in Figure 4.

Figure  6.  The  final  pruned  parallel  coordinate  plot  with  all  observations 
removed  except  those  corresponding  to  the  internal  structure.  The  plot  is 
again  rescaled.  The  five  gaps  on  axes  two  and  three  are  suggestive  of  six 
clusters. 

Figure  7.  The  result  of  a  grand  tour  rotation  of  the  data  in  Figure  6.  The 
rotation  confirms  that  these  are  six  clusters  completely  separable  in  at  least 
three  of  the  five  dimensions. 

Figure 8. The result of plotting the data isolated in the parallel coordinate
display back into the scatterplot matrix. It is now apparent that the six
clusters form the letters E U R E K A. The six letters are made up of 99
points of the 3848 in the original data set, less than 2.7% of the total
observations.